Toronto Crime Predictions¶


Table of Contents¶

  1. Introduction
    1. Project Overview
    2. Tools & Technologies Used
    3. Data Sources
  2. Retrieve Data from API
  3. Preprocess the Data
  4. Create the Model
    1. Creating the Testing and Training Datasets - First Approach
    2. Testing Different Models - First Approach
    3. Optimizing the models - First Approach
      1. Perform Hyper-Parameter Tuning on the Models
        1. Iterations for Random Forest (RF) Regressor
        2. Iterations for Histogram-Based Gradient Boosting (HBGB)
        3. Iterations for K-Nearest Neighbors (KNN) Regressor
      2. Apply Boosting Algorithms
    4. Creating the Testing and Training Datasets - Second Approach
    5. Testing Different Models - Second Approach
      1. Predicting Total Counts
      2. Predicting Assault Counts
      3. Predicting Auto Theft Counts
      4. Predicting Break and Enter Counts
      5. Predicting Robbery Counts
      6. Predicting Theft Over Counts
    6. Optimizing the models - Second Approach
      1. Auto Theft Models
        1. Apply Regression Chain Boosting Algorithm on RF Regressor
        2. Perform Hyper-Parameter Tuning on RF Regression Chain
        3. Apply ADA Boosting Algorithm on HBGB
        4. Perform Hyper-Parameter Tuning on ADA Boosted HBGB Regressor
      2. Total Count Models
        1. Apply Regression Chain Boosting Algorithm on RF Regressor
        2. Perform Hyper-Parameter Tuning on RF Regression Chain
        3. Apply ADA Boosting Algorithm on HBGB
        4. Perform Hyper-Parameter Tuning on ADA Boosted HBGB Regressor
        5. Perform Hyper-Parameter Tuning on RF Regressor
        6. Perform Hyper-Parameter Tuning on HBGB Regressor
        7. Create a Voting Ensemble Learning Model with the default RF and HBGB Models
          1. Final Model
  5. Results
    1. Visualizations based on current data
      1. Total count of Criminal Incidents for past three years
      2. Crime Statistics for Previous Years with a Month Breakdown
      3. Crime Statistics for 2021
      4. Crime Statistics for 2022
      5. Crime Statistics for 2023
      6. Comparison of Actual Data and the Predicted Data of 2024
    2. Visualizations based on the Predictions
      1. Anticipated Crime Statistics for next six months of 2024
      2. Anticipated Total count of Crime Activities for upcoming three years
      3. Anticipated Crime Statistics for Upcoming Years with a Month Breakdown
      4. Anticipated Crime Statistics for 2025
      5. Anticipated Crime Statistics for 2026
      6. Anticipated Crime Statistics for 2027
  6. Summary and Conclusion

Introduction¶

Project Overview¶

This project analyzes patterns and trends in criminal activity from January 2021 through June 2024, using the Major Crime Indicators (MCI) dataset for Toronto. The objective is to predict future patterns of criminal activity from the patterns present in the current data.

Tools & Technologies Used¶

Data Retrieval:

  • Requests (Python) for making API calls to retrieve the data for analysis and visualization purposes.

Data Manipulation:

  • Pandas for the majority of data cleaning, preprocessing, and manipulation steps.
  • GeoPandas for preprocessing and manipulating GeoJSON data.

Machine Learning:

  • Scikit-learn for applying regression algorithms (e.g., Random Forest Regressor, Histogram-Based Gradient Boosting) and ensemble learning and optimization methods (e.g., Voting Regressor, ADA Boosting, Hyper-Parameter Tuning) to predict future crime trends.

Data Visualization:

  • Matplotlib and Seaborn (Python) for visualizing crime trends over time.
  • Folium (Python) for visualizing geospatial data.

Data Sources¶

Toronto Police Service Major Crime Indicators (MCI) Data:

This dataset includes all Major Crime Indicators (MCI) occurrences by reported date and related offences since 2014. The Major Crime Indicators categories include Assault, Break and Enter, Auto Theft, Robbery and Theft Over (Excludes Sexual Violations).

This data is provided at the offence and/or victim level, therefore one occurrence number may have several rows of data associated to the various MCIs used to categorize the occurrence.

This data does not include occurrences that have been deemed unfounded. The definition of unfounded according to Statistics Canada is: “It has been determined through police investigation that the offence reported did not occur, nor was it attempted” (Statistics Canada, 2020).

The location of crime occurrences have been deliberately offset to the nearest road intersection node to protect the privacy of parties involved in the occurrence...Due to the offset of occurrence location, the numbers by...Neighbourhood may not reflect the exact count of occurrences reported within these [neighbourhoods].

-- Toronto Police Service Public Safety Data Portal

  • Further information on the dataset is available here.
  • The data is accessed through an ArcGIS REST API.

Neighbourhoods Data:

  • City-of-Toronto-designated social planning neighbourhood boundaries defined primarily to help City staff collect data, plan, analyze and forecast City services.
  • Further information available here.

Retrieved neighbourhood boundary coordinate data from GeoJSON file:

geometry _id AREA_ID AREA_ATTR_ID PARENT_AREA_ID AREA_SHORT_CODE AREA_LONG_CODE AREA_NAME AREA_DESC CLASSIFICATION CLASSIFICATION_CODE OBJECTID
0 MULTIPOLYGON (((-79.38635 43.69783, -79.38623 ... 1 2502366 26022881 0 174 174 South Eglinton-Davisville South Eglinton-Davisville (174) Not an NIA or Emerging Neighbourhood NA 17824737.0
1 MULTIPOLYGON (((-79.39744 43.70693, -79.39837 ... 2 2502365 26022880 0 173 173 North Toronto North Toronto (173) Not an NIA or Emerging Neighbourhood NA 17824753.0
2 MULTIPOLYGON (((-79.43411 43.66015, -79.43537 ... 3 2502364 26022879 0 172 172 Dovercourt Village Dovercourt Village (172) Not an NIA or Emerging Neighbourhood NA 17824769.0
3 MULTIPOLYGON (((-79.4387 43.66766, -79.43841 4... 4 2502363 26022878 0 171 171 Junction-Wallace Emerson Junction-Wallace Emerson (171) Not an NIA or Emerging Neighbourhood NA 17824785.0
4 MULTIPOLYGON (((-79.38404 43.64497, -79.38502 ... 5 2502362 26022877 0 170 170 Yonge-Bay Corridor Yonge-Bay Corridor (170) Not an NIA or Emerging Neighbourhood NA 17824801.0
... ... ... ... ... ... ... ... ... ... ... ... ...
153 MULTIPOLYGON (((-79.59037 43.73401, -79.58942 ... 154 2502213 26022728 0 001 001 West Humber-Clairville West Humber-Clairville (1) Not an NIA or Emerging Neighbourhood NA 17827185.0
154 MULTIPOLYGON (((-79.51915 43.77399, -79.51901 ... 155 2502212 26022727 0 024 024 Black Creek Black Creek (24) Neighbourhood Improvement Area NIA 17827201.0
155 MULTIPOLYGON (((-79.53225 43.73505, -79.52938 ... 156 2502211 26022726 0 023 023 Pelmo Park-Humberlea Pelmo Park-Humberlea (23) Not an NIA or Emerging Neighbourhood NA 17827217.0
156 MULTIPOLYGON (((-79.52813 43.74425, -79.52721 ... 157 2502210 26022725 0 022 022 Humbermede Humbermede (22) Neighbourhood Improvement Area NIA 17827233.0
157 MULTIPOLYGON (((-79.53396 43.76886, -79.53227 ... 158 2502209 26022724 0 021 021 Humber Summit Humber Summit (21) Neighbourhood Improvement Area NIA 17827249.0

158 rows × 12 columns

Extracted the columns with neighbourhood names and boundary coordinates:

geometry NEIGHBOURHOOD_158
0 MULTIPOLYGON (((-79.38635 43.69783, -79.38623 ... South Eglinton-Davisville (174)
1 MULTIPOLYGON (((-79.39744 43.70693, -79.39837 ... North Toronto (173)
2 MULTIPOLYGON (((-79.43411 43.66015, -79.43537 ... Dovercourt Village (172)
3 MULTIPOLYGON (((-79.4387 43.66766, -79.43841 4... Junction-Wallace Emerson (171)
4 MULTIPOLYGON (((-79.38404 43.64497, -79.38502 ... Yonge-Bay Corridor (170)
... ... ...
153 MULTIPOLYGON (((-79.59037 43.73401, -79.58942 ... West Humber-Clairville (1)
154 MULTIPOLYGON (((-79.51915 43.77399, -79.51901 ... Black Creek (24)
155 MULTIPOLYGON (((-79.53225 43.73505, -79.52938 ... Pelmo Park-Humberlea (23)
156 MULTIPOLYGON (((-79.52813 43.74425, -79.52721 ... Humbermede (22)
157 MULTIPOLYGON (((-79.53396 43.76886, -79.53227 ... Humber Summit (21)

158 rows × 2 columns

Extracted the numeric code of each neighbourhood and placed it in a separate column for labeling purposes:

geometry NEIGHBOURHOOD_158 HOOD_158
0 MULTIPOLYGON (((-79.38635 43.69783, -79.38623 ... South Eglinton-Davisville (174) 174
1 MULTIPOLYGON (((-79.39744 43.70693, -79.39837 ... North Toronto (173) 173
2 MULTIPOLYGON (((-79.43411 43.66015, -79.43537 ... Dovercourt Village (172) 172
3 MULTIPOLYGON (((-79.4387 43.66766, -79.43841 4... Junction-Wallace Emerson (171) 171
4 MULTIPOLYGON (((-79.38404 43.64497, -79.38502 ... Yonge-Bay Corridor (170) 170
... ... ... ...
153 MULTIPOLYGON (((-79.59037 43.73401, -79.58942 ... West Humber-Clairville (1) 1
154 MULTIPOLYGON (((-79.51915 43.77399, -79.51901 ... Black Creek (24) 24
155 MULTIPOLYGON (((-79.53225 43.73505, -79.52938 ... Pelmo Park-Humberlea (23) 23
156 MULTIPOLYGON (((-79.52813 43.74425, -79.52721 ... Humbermede (22) 22
157 MULTIPOLYGON (((-79.53396 43.76886, -79.53227 ... Humber Summit (21) 21

158 rows × 3 columns
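The code extraction described above can be sketched with a regular expression. This is a minimal illustration, assuming the code always appears in parentheses at the end of the name, as it does in the rows shown; the sample frame is made up for the example and is not the notebook's own code.

```python
import pandas as pd

# Illustrative sample only; the real column comes from the GeoJSON file.
hoods = pd.DataFrame({
    "NEIGHBOURHOOD_158": [
        "South Eglinton-Davisville (174)",
        "West Humber-Clairville (1)",
        "Mimico (includes Humber Bay Shores) (17)",
    ]
})

# Capture the digits inside the final pair of parentheses.
hoods["HOOD_158"] = (
    hoods["NEIGHBOURHOOD_158"]
    .str.extract(r"\((\d+)\)$", expand=False)
    .astype(int)
)
print(hoods)
```

Anchoring the pattern with `$` keeps names that contain an earlier parenthetical from matching in the wrong place.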

Toronto Neighbourhood Boundaries Map:

(Interactive Folium map; not rendered in this static export.)

Retrieve Data from API¶

Extracting crime data from the API in batches of 2,000 records, the maximum the API transfers per call.
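The batching described above might look like the following sketch. The endpoint URL is a placeholder and the `where` filter is an assumption; `resultOffset` and `resultRecordCount` are the standard ArcGIS REST paging parameters. The helper names `fetch_page` and `fetch_all` are hypothetical, not taken from the notebook.

```python
from typing import Callable, Dict, List

# Placeholder endpoint (assumption): substitute the real FeatureServer layer URL.
QUERY_URL = "https://example.com/arcgis/rest/services/MCI/FeatureServer/0/query"
BATCH_SIZE = 2000  # transfer limit per API call, as stated above


def fetch_page(offset: int, limit: int = BATCH_SIZE) -> List[Dict]:
    """Request one page of GeoJSON features starting at `offset`."""
    import requests  # imported here so the paging logic below has no dependencies

    params = {
        "where": "1=1",  # assumed filter: match every record
        "outFields": "*",
        "f": "geojson",
        "resultOffset": offset,
        "resultRecordCount": limit,
    }
    response = requests.get(QUERY_URL, params=params, timeout=60)
    response.raise_for_status()
    return response.json().get("features", [])


def fetch_all(get_page: Callable[[int, int], List[Dict]],
              limit: int = BATCH_SIZE) -> List[Dict]:
    """Call `get_page` with growing offsets until a short or empty page arrives."""
    features: List[Dict] = []
    offset = 0
    while True:
        page = get_page(offset, limit)
        features.extend(page)
        if len(page) < limit:
            break
        offset += limit
    return features
```

Calling `fetch_all(fetch_page)` would then collect every batch into one list. Passing the page function as an argument keeps the loop testable without network access.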

Previewing JSON array structure:

[{'type': 'Feature',
  'id': 246675,
  'geometry': {'type': 'Point',
   'coordinates': [-79.425761926, 43.6817690130001]},
  'properties': {'OBJECTID': 246675,
   'EVENT_UNIQUE_ID': 'GO-20213605',
   'REPORT_DATE': 1609477200000,
   'OCC_DATE': 1609477200000,
   'REPORT_YEAR': 2021,
   'REPORT_MONTH': 'January',
   'REPORT_DAY': 1,
   'REPORT_DOY': 1,
   'REPORT_DOW': 'Friday    ',
   'REPORT_HOUR': 16,
   'OCC_YEAR': 2021,
   'OCC_MONTH': 'January',
   'OCC_DAY': 1,
   'OCC_DOY': 1,
   'OCC_DOW': 'Friday    ',
   'OCC_HOUR': 16,
   'DIVISION': 'D13',
   'LOCATION_TYPE': 'Parking Lots (Apt., Commercial Or Non-Commercial)',
   'PREMISES_TYPE': 'Outside',
   'UCR_CODE': 2135,
   'UCR_EXT': 210,
   'OFFENCE': 'Theft Of Motor Vehicle',
   'MCI_CATEGORY': 'Auto Theft',
   'HOOD_158': '094',
   'NEIGHBOURHOOD_158': 'Wychwood (94)',
   'HOOD_140': '094',
   'NEIGHBOURHOOD_140': 'Wychwood (94)',
   'LONG_WGS84': -79.42576192637651,
   'LAT_WGS84': 43.68176901263976}},
 {'type': 'Feature',
  'id': 246676,
  'geometry': {'type': 'Point',
   'coordinates': [5.6843418860808e-14, 5.08888749034163e-14]},
  'properties': {'OBJECTID': 246676,
   'EVENT_UNIQUE_ID': 'GO-20213400',
   'REPORT_DATE': 1609477200000,
   'OCC_DATE': 1609477200000,
   'REPORT_YEAR': 2021,
   'REPORT_MONTH': 'January',
   'REPORT_DAY': 1,
   'REPORT_DOY': 1,
   'REPORT_DOW': 'Friday    ',
   'REPORT_HOUR': 16,
   'OCC_YEAR': 2021,
   'OCC_MONTH': 'January',
   'OCC_DAY': 1,
   'OCC_DOY': 1,
   'OCC_DOW': 'Friday    ',
   'OCC_HOUR': 4,
   'DIVISION': 'D33',
   'LOCATION_TYPE': 'Other Commercial / Corporate Places (For Profit, Warehouse, Corp. Bldg',
   'PREMISES_TYPE': 'Commercial',
   'UCR_CODE': 2135,
   'UCR_EXT': 210,
   'OFFENCE': 'Theft Of Motor Vehicle',
   'MCI_CATEGORY': 'Auto Theft',
   'HOOD_158': 'NSA',
   'NEIGHBOURHOOD_158': 'NSA',
   'HOOD_140': 'NSA',
   'NEIGHBOURHOOD_140': 'NSA',
   'LONG_WGS84': 0,
   'LAT_WGS84': 0}},
 {'type': 'Feature',
  'id': 246677,
  'geometry': {'type': 'Point', 'coordinates': [-79.460110312, 43.721012854]},
  'properties': {'OBJECTID': 246677,
   'EVENT_UNIQUE_ID': 'GO-20211123',
   'REPORT_DATE': 1609477200000,
   'OCC_DATE': 1609477200000,
   'REPORT_YEAR': 2021,
   'REPORT_MONTH': 'January',
   'REPORT_DAY': 1,
   'REPORT_DOY': 1,
   'REPORT_DOW': 'Friday    ',
   'REPORT_HOUR': 7,
   'OCC_YEAR': 2021,
   'OCC_MONTH': 'January',
   'OCC_DAY': 1,
   'OCC_DOY': 1,
   'OCC_DOW': 'Friday    ',
   'OCC_HOUR': 4,
   'DIVISION': 'D32',
   'LOCATION_TYPE': "Other Non Commercial / Corporate Places (Non-Profit, Gov'T, Firehall)",
   'PREMISES_TYPE': 'Other',
   'UCR_CODE': 2135,
   'UCR_EXT': 210,
   'OFFENCE': 'Theft Of Motor Vehicle',
   'MCI_CATEGORY': 'Auto Theft',
   'HOOD_158': '031',
   'NEIGHBOURHOOD_158': 'Yorkdale-Glen Park (31)',
   'HOOD_140': '031',
   'NEIGHBOURHOOD_140': 'Yorkdale-Glen Park (31)',
   'LONG_WGS84': -79.46011031171706,
   'LAT_WGS84': 43.72101285418029}}]

Filtering out the GeoJSON metadata and keeping only each feature's properties, which hold the crime data itself:

[{'OBJECTID': 246675,
  'EVENT_UNIQUE_ID': 'GO-20213605',
  'REPORT_DATE': 1609477200000,
  'OCC_DATE': 1609477200000,
  'REPORT_YEAR': 2021,
  'REPORT_MONTH': 'January',
  'REPORT_DAY': 1,
  'REPORT_DOY': 1,
  'REPORT_DOW': 'Friday    ',
  'REPORT_HOUR': 16,
  'OCC_YEAR': 2021,
  'OCC_MONTH': 'January',
  'OCC_DAY': 1,
  'OCC_DOY': 1,
  'OCC_DOW': 'Friday    ',
  'OCC_HOUR': 16,
  'DIVISION': 'D13',
  'LOCATION_TYPE': 'Parking Lots (Apt., Commercial Or Non-Commercial)',
  'PREMISES_TYPE': 'Outside',
  'UCR_CODE': 2135,
  'UCR_EXT': 210,
  'OFFENCE': 'Theft Of Motor Vehicle',
  'MCI_CATEGORY': 'Auto Theft',
  'HOOD_158': '094',
  'NEIGHBOURHOOD_158': 'Wychwood (94)',
  'HOOD_140': '094',
  'NEIGHBOURHOOD_140': 'Wychwood (94)',
  'LONG_WGS84': -79.42576192637651,
  'LAT_WGS84': 43.68176901263976},
 {'OBJECTID': 246676,
  'EVENT_UNIQUE_ID': 'GO-20213400',
  'REPORT_DATE': 1609477200000,
  'OCC_DATE': 1609477200000,
  'REPORT_YEAR': 2021,
  'REPORT_MONTH': 'January',
  'REPORT_DAY': 1,
  'REPORT_DOY': 1,
  'REPORT_DOW': 'Friday    ',
  'REPORT_HOUR': 16,
  'OCC_YEAR': 2021,
  'OCC_MONTH': 'January',
  'OCC_DAY': 1,
  'OCC_DOY': 1,
  'OCC_DOW': 'Friday    ',
  'OCC_HOUR': 4,
  'DIVISION': 'D33',
  'LOCATION_TYPE': 'Other Commercial / Corporate Places (For Profit, Warehouse, Corp. Bldg',
  'PREMISES_TYPE': 'Commercial',
  'UCR_CODE': 2135,
  'UCR_EXT': 210,
  'OFFENCE': 'Theft Of Motor Vehicle',
  'MCI_CATEGORY': 'Auto Theft',
  'HOOD_158': 'NSA',
  'NEIGHBOURHOOD_158': 'NSA',
  'HOOD_140': 'NSA',
  'NEIGHBOURHOOD_140': 'NSA',
  'LONG_WGS84': 0,
  'LAT_WGS84': 0},
 {'OBJECTID': 246677,
  'EVENT_UNIQUE_ID': 'GO-20211123',
  'REPORT_DATE': 1609477200000,
  'OCC_DATE': 1609477200000,
  'REPORT_YEAR': 2021,
  'REPORT_MONTH': 'January',
  'REPORT_DAY': 1,
  'REPORT_DOY': 1,
  'REPORT_DOW': 'Friday    ',
  'REPORT_HOUR': 7,
  'OCC_YEAR': 2021,
  'OCC_MONTH': 'January',
  'OCC_DAY': 1,
  'OCC_DOY': 1,
  'OCC_DOW': 'Friday    ',
  'OCC_HOUR': 4,
  'DIVISION': 'D32',
  'LOCATION_TYPE': "Other Non Commercial / Corporate Places (Non-Profit, Gov'T, Firehall)",
  'PREMISES_TYPE': 'Other',
  'UCR_CODE': 2135,
  'UCR_EXT': 210,
  'OFFENCE': 'Theft Of Motor Vehicle',
  'MCI_CATEGORY': 'Auto Theft',
  'HOOD_158': '031',
  'NEIGHBOURHOOD_158': 'Yorkdale-Glen Park (31)',
  'HOOD_140': '031',
  'NEIGHBOURHOOD_140': 'Yorkdale-Glen Park (31)',
  'LONG_WGS84': -79.46011031171706,
  'LAT_WGS84': 43.72101285418029}]
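The filtering step shown above amounts to keeping the `properties` mapping of each feature. A minimal sketch, using a shortened version of the first record; the helper name `extract_properties` is hypothetical:

```python
from typing import Any, Dict, List


def extract_properties(features: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop the GeoJSON wrapper and keep each feature's 'properties' dict."""
    return [feature["properties"] for feature in features]


# Shortened sample feature; the real records carry 29 properties.
sample_features = [
    {
        "type": "Feature",
        "id": 246675,
        "geometry": {"type": "Point", "coordinates": [-79.425761926, 43.6817690130001]},
        "properties": {"OBJECTID": 246675, "MCI_CATEGORY": "Auto Theft", "HOOD_158": "094"},
    }
]

records = extract_properties(sample_features)
print(records)
```

A list of dictionaries in this shape can be passed directly to `pandas.DataFrame(records)`.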

Converting JSON array into a Pandas DataFrame:

OBJECTID EVENT_UNIQUE_ID REPORT_DATE OCC_DATE REPORT_YEAR REPORT_MONTH REPORT_DAY REPORT_DOY REPORT_DOW REPORT_HOUR ... UCR_CODE UCR_EXT OFFENCE MCI_CATEGORY HOOD_158 NEIGHBOURHOOD_158 HOOD_140 NEIGHBOURHOOD_140 LONG_WGS84 LAT_WGS84
0 246675 GO-20213605 1609477200000 1609477200000 2021 January 1 1 Friday 16 ... 2135 210 Theft Of Motor Vehicle Auto Theft 094 Wychwood (94) 094 Wychwood (94) -79.425762 43.681769
1 246676 GO-20213400 1609477200000 1609477200000 2021 January 1 1 Friday 16 ... 2135 210 Theft Of Motor Vehicle Auto Theft NSA NSA NSA NSA 0.000000 0.000000
2 246677 GO-20211123 1609477200000 1609477200000 2021 January 1 1 Friday 7 ... 2135 210 Theft Of Motor Vehicle Auto Theft 031 Yorkdale-Glen Park (31) 031 Yorkdale-Glen Park (31) -79.460110 43.721013
3 246678 GO-2021445 1609477200000 1609477200000 2021 January 1 1 Friday 1 ... 2135 210 Theft Of Motor Vehicle Auto Theft 151 Yonge-Doris (151) 051 Willowdale East (51) -79.415293 43.778743
4 246679 GO-20213400 1609477200000 1609477200000 2021 January 1 1 Friday 16 ... 2135 210 Theft Of Motor Vehicle Auto Theft NSA NSA NSA NSA 0.000000 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
147546 396731 GO-20241427047 1719723600000 1719637200000 2024 June 30 182 Sunday 16 ... 1430 100 Assault Assault 071 Cabbagetown-South St.James Town (71) 071 Cabbagetown-South St.James Town (71) -79.373043 43.663195
147547 396732 GO-20241427869 1719723600000 1719723600000 2024 June 30 182 Sunday 18 ... 2133 200 Theft Over - Shoplifting Theft Over 027 York University Heights (27) 027 York University Heights (27) -79.464942 43.759469
147548 396733 GO-20241423116 1719723600000 1719637200000 2024 June 30 182 Sunday 2 ... 1450 120 Discharge Firearm With Intent Assault 144 Morningside Heights (144) 131 Rouge (131) -79.248477 43.837237
147549 396734 GO-20241426669 1719723600000 1718859600000 2024 June 30 182 Sunday 15 ... 2132 200 Theft From Motor Vehicle Over Theft Over 160 Mimico-Queensway (160) 017 Mimico (includes Humber Bay Shores) (17) -79.521053 43.616490
147550 396735 GO-20241425318 1719723600000 1719637200000 2024 June 30 182 Sunday 11 ... 1430 100 Assault Assault 018 New Toronto (18) 018 New Toronto (18) -79.513940 43.598831

147551 rows × 29 columns


Preprocess the Data¶

Previewing all the columns (features) of the DataFrame:

Index(['OBJECTID', 'EVENT_UNIQUE_ID', 'REPORT_DATE', 'OCC_DATE', 'REPORT_YEAR',
       'REPORT_MONTH', 'REPORT_DAY', 'REPORT_DOY', 'REPORT_DOW', 'REPORT_HOUR',
       'OCC_YEAR', 'OCC_MONTH', 'OCC_DAY', 'OCC_DOY', 'OCC_DOW', 'OCC_HOUR',
       'DIVISION', 'LOCATION_TYPE', 'PREMISES_TYPE', 'UCR_CODE', 'UCR_EXT',
       'OFFENCE', 'MCI_CATEGORY', 'HOOD_158', 'NEIGHBOURHOOD_158', 'HOOD_140',
       'NEIGHBOURHOOD_140', 'LONG_WGS84', 'LAT_WGS84'],
      dtype='object')

Replacing the NSA placeholder values in the neighbourhood numeric code column (HOOD_158) with 0 so that the column can be encoded as integers:

OBJECTID EVENT_UNIQUE_ID REPORT_DATE OCC_DATE REPORT_YEAR REPORT_MONTH REPORT_DAY REPORT_DOY REPORT_DOW REPORT_HOUR ... UCR_CODE UCR_EXT OFFENCE MCI_CATEGORY HOOD_158 NEIGHBOURHOOD_158 HOOD_140 NEIGHBOURHOOD_140 LONG_WGS84 LAT_WGS84
0 246675 GO-20213605 1609477200000 1609477200000 2021 January 1 1 Friday 16 ... 2135 210 Theft Of Motor Vehicle Auto Theft 94 Wychwood (94) 094 Wychwood (94) -79.425762 43.681769
1 246676 GO-20213400 1609477200000 1609477200000 2021 January 1 1 Friday 16 ... 2135 210 Theft Of Motor Vehicle Auto Theft 0 NSA NSA NSA 0.000000 0.000000
2 246677 GO-20211123 1609477200000 1609477200000 2021 January 1 1 Friday 7 ... 2135 210 Theft Of Motor Vehicle Auto Theft 31 Yorkdale-Glen Park (31) 031 Yorkdale-Glen Park (31) -79.460110 43.721013
3 246678 GO-2021445 1609477200000 1609477200000 2021 January 1 1 Friday 1 ... 2135 210 Theft Of Motor Vehicle Auto Theft 151 Yonge-Doris (151) 051 Willowdale East (51) -79.415293 43.778743
4 246679 GO-20213400 1609477200000 1609477200000 2021 January 1 1 Friday 16 ... 2135 210 Theft Of Motor Vehicle Auto Theft 0 NSA NSA NSA 0.000000 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
147546 396731 GO-20241427047 1719723600000 1719637200000 2024 June 30 182 Sunday 16 ... 1430 100 Assault Assault 71 Cabbagetown-South St.James Town (71) 071 Cabbagetown-South St.James Town (71) -79.373043 43.663195
147547 396732 GO-20241427869 1719723600000 1719723600000 2024 June 30 182 Sunday 18 ... 2133 200 Theft Over - Shoplifting Theft Over 27 York University Heights (27) 027 York University Heights (27) -79.464942 43.759469
147548 396733 GO-20241423116 1719723600000 1719637200000 2024 June 30 182 Sunday 2 ... 1450 120 Discharge Firearm With Intent Assault 144 Morningside Heights (144) 131 Rouge (131) -79.248477 43.837237
147549 396734 GO-20241426669 1719723600000 1718859600000 2024 June 30 182 Sunday 15 ... 2132 200 Theft From Motor Vehicle Over Theft Over 160 Mimico-Queensway (160) 017 Mimico (includes Humber Bay Shores) (17) -79.521053 43.616490
147550 396735 GO-20241425318 1719723600000 1719637200000 2024 June 30 182 Sunday 11 ... 1430 100 Assault Assault 18 New Toronto (18) 018 New Toronto (18) -79.513940 43.598831

147551 rows × 29 columns
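The replacement shown above, where `'094'` becomes `94` and `'NSA'` becomes `0`, can be sketched as follows. This is one way to get that result on a small illustrative frame, not necessarily the notebook's own code.

```python
import pandas as pd

# Illustrative values copied from the preview rows above.
df = pd.DataFrame({"HOOD_158": ["094", "NSA", "031", "151"]})

# Swap the placeholder for "0", then cast the zero-padded strings to integers.
df["HOOD_158"] = df["HOOD_158"].replace("NSA", "0").astype(int)
print(df["HOOD_158"].tolist())
```

Using 0 as the stand-in is safe only because no real neighbourhood uses that code, which makes these rows easy to filter out later.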

Previewing the occurrence month column to verify that the month names are written in full, not abbreviated:

OCC_MONTH
0 January
1 January
2 January
3 January
4 January
... ...
147546 June
147547 June
147548 June
147549 June
147550 June

147551 rows × 1 columns


Encoding the month column by converting the month names to their corresponding numbers:

OCC_MONTH
0 1
1 1
2 1
3 1
4 1
... ...
147546 6
147547 6
147548 6
147549 6
147550 6

147551 rows × 1 columns
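The month encoding above can be sketched with a lookup built from the standard library. It assumes full English month names, which the preceding preview confirms; the helper name `encode_months` is hypothetical.

```python
import calendar

# {"January": 1, ..., "December": 12}; index 0 of month_name is an empty string.
MONTH_TO_NUMBER = {
    name: number for number, name in enumerate(calendar.month_name) if name
}


def encode_months(names):
    """Map full month names to 1-12, ignoring surrounding whitespace."""
    return [MONTH_TO_NUMBER[name.strip()] for name in names]


print(encode_months(["January", "June"]))
```

On a DataFrame column the same dictionary can be applied with `df["OCC_MONTH"].map(MONTH_TO_NUMBER)`.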


Keeping only the columns that are candidate features for the machine learning model, along with the target column, MCI_CATEGORY:

EVENT_UNIQUE_ID NEIGHBOURHOOD_158 HOOD_158 LAT_WGS84 LONG_WGS84 PREMISES_TYPE OCC_DATE OCC_YEAR OCC_MONTH OCC_DAY OCC_HOUR MCI_CATEGORY
0 GO-20213605 Wychwood (94) 94 43.681769 -79.425762 Outside 1609477200000 2021 1 1 16 Auto Theft
1 GO-20213400 NSA 0 0.000000 0.000000 Commercial 1609477200000 2021 1 1 4 Auto Theft
2 GO-20211123 Yorkdale-Glen Park (31) 31 43.721013 -79.460110 Other 1609477200000 2021 1 1 4 Auto Theft
3 GO-2021445 Yonge-Doris (151) 151 43.778743 -79.415293 Other 1609477200000 2021 1 1 1 Auto Theft
4 GO-20213400 NSA 0 0.000000 0.000000 Commercial 1609477200000 2021 1 1 4 Auto Theft
... ... ... ... ... ... ... ... ... ... ... ... ...
147546 GO-20241427047 Cabbagetown-South St.James Town (71) 71 43.663195 -79.373043 Apartment 1719637200000 2024 6 29 23 Assault
147547 GO-20241427869 York University Heights (27) 27 43.759469 -79.464942 Commercial 1719723600000 2024 6 30 18 Theft Over
147548 GO-20241423116 Morningside Heights (144) 144 43.837237 -79.248477 Outside 1719637200000 2024 6 29 21 Assault
147549 GO-20241426669 Mimico-Queensway (160) 160 43.616490 -79.521053 Outside 1718859600000 2024 6 20 13 Theft Over
147550 GO-20241425318 New Toronto (18) 18 43.598831 -79.513940 House 1719637200000 2024 6 29 20 Assault

147551 rows × 12 columns

One-Hot Encoding Target Variable:

Assault Auto Theft Break and Enter Robbery Theft Over
0 0 1 0 0 0
1 0 1 0 0 0
2 0 1 0 0 0
3 0 1 0 0 0
4 0 1 0 0 0
... ... ... ... ... ...
147546 1 0 0 0 0
147547 0 0 0 0 1
147548 1 0 0 0 0
147549 0 0 0 0 1
147550 1 0 0 0 0

147551 rows × 5 columns
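One-hot encoding of the target, as shown above, can be sketched with `pandas.get_dummies`. The sample values are illustrative, and the notebook may have used a different encoder.

```python
import pandas as pd

# Illustrative target values; the real column has five MCI categories.
df = pd.DataFrame({"MCI_CATEGORY": ["Auto Theft", "Assault", "Theft Over", "Assault"]})

# One indicator column per category, stored as 0/1 integers.
one_hot = pd.get_dummies(df["MCI_CATEGORY"], dtype=int)
print(one_hot)
```

Each row holds a single 1, so summing the indicator columns later gives incident counts per category.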

Dropping the unencoded target column:

EVENT_UNIQUE_ID NEIGHBOURHOOD_158 HOOD_158 LAT_WGS84 LONG_WGS84 PREMISES_TYPE OCC_DATE OCC_YEAR OCC_MONTH OCC_DAY OCC_HOUR Assault Auto Theft Break and Enter Robbery Theft Over
0 GO-20213605 Wychwood (94) 94 43.681769 -79.425762 Outside 1609477200000 2021 1 1 16 0 1 0 0 0
1 GO-20213400 NSA 0 0.000000 0.000000 Commercial 1609477200000 2021 1 1 4 0 1 0 0 0
2 GO-20211123 Yorkdale-Glen Park (31) 31 43.721013 -79.460110 Other 1609477200000 2021 1 1 4 0 1 0 0 0
3 GO-2021445 Yonge-Doris (151) 151 43.778743 -79.415293 Other 1609477200000 2021 1 1 1 0 1 0 0 0
4 GO-20213400 NSA 0 0.000000 0.000000 Commercial 1609477200000 2021 1 1 4 0 1 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
147546 GO-20241427047 Cabbagetown-South St.James Town (71) 71 43.663195 -79.373043 Apartment 1719637200000 2024 6 29 23 1 0 0 0 0
147547 GO-20241427869 York University Heights (27) 27 43.759469 -79.464942 Commercial 1719723600000 2024 6 30 18 0 0 0 0 1
147548 GO-20241423116 Morningside Heights (144) 144 43.837237 -79.248477 Outside 1719637200000 2024 6 29 21 1 0 0 0 0
147549 GO-20241426669 Mimico-Queensway (160) 160 43.616490 -79.521053 Outside 1718859600000 2024 6 20 13 0 0 0 0 1
147550 GO-20241425318 New Toronto (18) 18 43.598831 -79.513940 House 1719637200000 2024 6 29 20 1 0 0 0 0

147551 rows × 16 columns

NEIGHBOURHOOD_158 HOOD_158 LAT_WGS84 LONG_WGS84 PREMISES_TYPE OCC_DATE OCC_YEAR OCC_MONTH OCC_DAY OCC_HOUR
EVENT_UNIQUE_ID
GO-20211000033 West Queen West (162) 162 43.646286 -79.408568 Commercial 1622264400000 2021 5 29 21
GO-2021100004 Morningside Heights (144) 144 43.807252 -79.162903 Outside 1610773200000 2021 1 16 17
GO-20211000054 Moss Park (73) 73 43.657067 -79.374531 Apartment 1622264400000 2021 5 29 22
GO-20211000193 Fort York-Liberty Village (163) 163 43.636618 -79.399704 Apartment 1622264400000 2021 5 29 23
GO-20211000248 Eglinton East (138) 138 43.737099 -79.246230 Outside 1622264400000 2021 5 29 21
... ... ... ... ... ... ... ... ... ... ...
GO-20249997 Junction-Wallace Emerson (171) 171 43.668917 -79.442637 Outside 1704085200000 2024 1 1 18
GO-202499972 Edenbridge-Humber Valley (9) 9 43.672705 -79.522472 House 1705208400000 2024 1 14 3
GO-2024999786 Flemingdon Park (44) 44 43.718727 -79.334948 Apartment 1714539600000 2024 5 1 0
GO-2024999795 Oakridge (121) 121 43.691225 -79.288346 Commercial 1715230800000 2024 5 9 13
GO-2024999882 Eglinton East (138) 138 43.738856 -79.238421 Commercial 1715230800000 2024 5 9 14

129217 rows × 10 columns

Assault Auto Theft Break and Enter Robbery Theft Over
EVENT_UNIQUE_ID
GO-20211000033 0 0 1 0 0
GO-2021100004 0 1 0 0 0
GO-20211000054 1 0 0 0 0
GO-20211000193 1 0 0 0 0
GO-20211000248 1 0 0 0 0
... ... ... ... ... ...
GO-20249997 0 1 0 0 0
GO-202499972 0 1 0 0 0
GO-2024999786 1 0 0 0 0
GO-2024999795 1 0 0 0 0
GO-2024999882 1 0 0 0 0

129217 rows × 5 columns

EVENT_UNIQUE_ID NEIGHBOURHOOD_158 HOOD_158 LAT_WGS84 LONG_WGS84 PREMISES_TYPE OCC_DATE OCC_YEAR OCC_MONTH OCC_DAY OCC_HOUR Assault Auto Theft Break and Enter Robbery Theft Over
0 GO-20211000033 West Queen West (162) 162 43.646286 -79.408568 Commercial 1622264400000 2021 5 29 21 0 0 1 0 0
1 GO-2021100004 Morningside Heights (144) 144 43.807252 -79.162903 Outside 1610773200000 2021 1 16 17 0 1 0 0 0
2 GO-20211000054 Moss Park (73) 73 43.657067 -79.374531 Apartment 1622264400000 2021 5 29 22 1 0 0 0 0
3 GO-20211000193 Fort York-Liberty Village (163) 163 43.636618 -79.399704 Apartment 1622264400000 2021 5 29 23 1 0 0 0 0
4 GO-20211000248 Eglinton East (138) 138 43.737099 -79.246230 Outside 1622264400000 2021 5 29 21 1 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
129212 GO-20249997 Junction-Wallace Emerson (171) 171 43.668917 -79.442637 Outside 1704085200000 2024 1 1 18 0 1 0 0 0
129213 GO-202499972 Edenbridge-Humber Valley (9) 9 43.672705 -79.522472 House 1705208400000 2024 1 14 3 0 1 0 0 0
129214 GO-2024999786 Flemingdon Park (44) 44 43.718727 -79.334948 Apartment 1714539600000 2024 5 1 0 1 0 0 0 0
129215 GO-2024999795 Oakridge (121) 121 43.691225 -79.288346 Commercial 1715230800000 2024 5 9 13 1 0 0 0 0
129216 GO-2024999882 Eglinton East (138) 138 43.738856 -79.238421 Commercial 1715230800000 2024 5 9 14 1 0 0 0 0

129217 rows × 16 columns


Create the Model¶

Creating the Testing and Training Datasets - First Approach¶

EVENT_UNIQUE_ID NEIGHBOURHOOD_158 HOOD_158 LAT_WGS84 LONG_WGS84 PREMISES_TYPE OCC_DATE OCC_YEAR OCC_MONTH OCC_DAY OCC_HOUR Assault Auto Theft Break and Enter Robbery Theft Over Total_Count
0 GO-20211000033 West Queen West (162) 162 43.646286 -79.408568 Commercial 1622264400000 2021 5 29 21 0 0 1 0 0 1
1 GO-2021100004 Morningside Heights (144) 144 43.807252 -79.162903 Outside 1610773200000 2021 1 16 17 0 1 0 0 0 1
2 GO-20211000054 Moss Park (73) 73 43.657067 -79.374531 Apartment 1622264400000 2021 5 29 22 1 0 0 0 0 1
3 GO-20211000193 Fort York-Liberty Village (163) 163 43.636618 -79.399704 Apartment 1622264400000 2021 5 29 23 1 0 0 0 0 1
4 GO-20211000248 Eglinton East (138) 138 43.737099 -79.246230 Outside 1622264400000 2021 5 29 21 1 0 0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
129212 GO-20249997 Junction-Wallace Emerson (171) 171 43.668917 -79.442637 Outside 1704085200000 2024 1 1 18 0 1 0 0 0 1
129213 GO-202499972 Edenbridge-Humber Valley (9) 9 43.672705 -79.522472 House 1705208400000 2024 1 14 3 0 1 0 0 0 1
129214 GO-2024999786 Flemingdon Park (44) 44 43.718727 -79.334948 Apartment 1714539600000 2024 5 1 0 1 0 0 0 0 1
129215 GO-2024999795 Oakridge (121) 121 43.691225 -79.288346 Commercial 1715230800000 2024 5 9 13 1 0 0 0 0 1
129216 GO-2024999882 Eglinton East (138) 138 43.738856 -79.238421 Commercial 1715230800000 2024 5 9 14 1 0 0 0 0 1

129217 rows × 17 columns

HOOD_158 OCC_YEAR OCC_MONTH Total_Count
0 0 2021 1 44
1 0 2021 2 34
2 0 2021 3 45
3 0 2021 4 26
4 0 2021 5 37
... ... ... ... ...
6672 174 2024 2 15
6673 174 2024 3 7
6674 174 2024 4 17
6675 174 2024 5 12
6676 174 2024 6 13

6677 rows × 4 columns

HOOD_158 OCC_YEAR OCC_MONTH Assault Auto Theft Break and Enter Robbery Theft Over Total_Count
1895 49 2021 6 1 0 1 0 0 1
2229 58 2021 4 0 0 1 0 0 1
1909 49 2022 8 0 1 0 0 0 1
5208 140 2021 2 1 0 0 0 0 1
1890 49 2021 1 0 0 1 0 0 1
... ... ... ... ... ... ... ... ... ...
77 1 2023 12 30 45 28 4 13 118
68 1 2023 3 18 83 12 3 3 119
65 1 2022 12 24 67 15 4 10 120
71 1 2023 6 24 77 22 4 5 131
66 1 2023 1 18 91 14 3 7 133

6677 rows × 9 columns

HOOD_158 OCC_YEAR OCC_MONTH Assault Auto Theft Break and Enter Robbery Theft Over Total_Count
0 0 2021 1 23 2 10 9 1 44
1 0 2021 2 24 1 8 1 0 34
2 0 2021 3 27 7 5 5 3 45
3 0 2021 4 16 1 3 3 3 26
4 0 2021 5 30 3 3 1 1 37
5 0 2021 6 22 3 3 2 3 32
6 0 2021 7 29 2 3 6 2 42
7 0 2021 8 48 6 2 7 3 66
8 0 2021 9 27 8 6 3 3 47
9 0 2021 10 39 9 15 2 2 65
10 0 2021 11 26 10 6 4 4 50
11 0 2021 12 22 5 8 2 1 37
12 0 2022 1 32 7 6 3 2 50
13 0 2022 2 30 10 3 1 1 45
14 0 2022 3 34 1 6 7 3 50
15 0 2022 4 28 8 3 11 3 52
16 0 2022 5 27 13 6 2 10 56
17 0 2022 6 25 7 4 4 2 41
18 0 2022 7 27 10 6 6 2 51
19 0 2022 8 28 6 6 6 6 52
20 0 2022 9 36 19 7 12 1 74
21 0 2022 10 35 10 5 7 2 58
22 0 2022 11 37 12 5 6 3 62
23 0 2022 12 25 12 2 4 1 43
24 0 2023 1 24 9 2 4 4 43
25 0 2023 2 17 12 1 5 2 37
26 0 2023 3 20 14 1 1 1 37
27 0 2023 4 17 12 1 2 1 33
28 0 2023 5 22 6 0 4 1 32
29 0 2023 6 13 13 0 0 1 27
30 0 2023 7 27 12 1 3 0 41
31 0 2023 8 15 11 1 1 1 29
32 0 2023 9 15 16 1 2 5 39
33 0 2023 10 20 14 0 1 1 36
34 0 2023 11 25 10 3 2 1 40
35 0 2023 12 24 5 0 6 0 34
36 0 2024 1 21 4 2 4 2 33
37 0 2024 2 17 4 0 4 1 26
38 0 2024 3 19 7 4 1 3 33
39 0 2024 4 19 5 1 1 0 26
40 0 2024 5 17 12 4 2 2 36
41 0 2024 6 20 7 0 4 1 31

HOOD_158 OCC_YEAR OCC_MONTH Assault Auto Theft Break and Enter Robbery Theft Over Total_Count
0 1 2021 1 18 35 7 1 3 62
1 1 2021 2 17 17 5 1 3 43
2 1 2021 3 15 20 8 6 6 54
3 1 2021 4 11 31 4 2 4 52
4 1 2021 5 18 26 9 5 4 62
... ... ... ... ... ... ... ... ... ...
6630 174 2024 2 9 0 5 1 1 15
6631 174 2024 3 6 1 0 0 0 7
6632 174 2024 4 12 2 2 0 1 17
6633 174 2024 5 8 1 2 0 1 12
6634 174 2024 6 6 4 1 1 1 13

6635 rows × 9 columns
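The tables in this subsection appear without commentary. One plausible reading, offered as an assumption rather than as the notebook's actual code, is that each incident gets a Total_Count of 1, rows are summed per neighbourhood, year, and month, and the rows with the unassigned neighbourhood code 0 are then dropped. A sketch on made-up incidents:

```python
import pandas as pd

CRIME_COLUMNS = ["Assault", "Auto Theft", "Break and Enter", "Robbery", "Theft Over"]

# Made-up one-hot encoded incidents, one row per event.
events = pd.DataFrame({
    "HOOD_158": [0, 1, 1, 1, 174],
    "OCC_YEAR": [2021, 2021, 2021, 2021, 2024],
    "OCC_MONTH": [1, 1, 1, 2, 6],
    "Assault": [1, 1, 0, 0, 1],
    "Auto Theft": [0, 0, 1, 1, 0],
    "Break and Enter": [0, 0, 0, 0, 0],
    "Robbery": [0, 0, 0, 0, 0],
    "Theft Over": [0, 0, 0, 0, 0],
})
events["Total_Count"] = events[CRIME_COLUMNS].sum(axis=1)

# Sum the counts for every (neighbourhood, year, month) combination.
monthly = events.groupby(
    ["HOOD_158", "OCC_YEAR", "OCC_MONTH"], as_index=False
)[CRIME_COLUMNS + ["Total_Count"]].sum()

# Drop the rows with no assigned neighbourhood (code 0).
monthly = monthly[monthly["HOOD_158"] != 0].reset_index(drop=True)
print(monthly)
```

This reading matches the row counts above: 6,677 grouped rows, of which 42 belong to code 0, leaving 6,635.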

Testing Different Models - First Approach¶

RandomForestRegressor()
MultiOutputRegressor(estimator=HistGradientBoostingRegressor())
Lasso()
ExtraTreesRegressor()
KNeighborsRegressor()
ElasticNet()
RadiusNeighborsRegressor()

back to the top

/usr/local/lib/python3.10/dist-packages/numpy/core/numeric.py:407: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
Random Forests Results
Mean Squared Error: 16.345516578762318
R-squared: 0.42185755406596
mean absolute error: 2.507420886075949
Histogram-Based Gradient Boosting Results
Mean Squared Error: 13.53697402119441
R-squared: 0.5401264890283979
mean absolute error: 2.3624557106149555
Lasso Regressor Results
Mean Squared Error: 56.06504159690181
R-squared: 0.0008593089048466821
mean absolute error: 4.053649104634542
Extra-Trees Regressor Results
Mean Squared Error: 21.009235548523208
R-squared: 0.2633688550325657
mean absolute error: 2.8370112517580868

K-Nearest Neighbors Regressor Results
Mean Squared Error: 22.812032348804497
R-squared: 0.3917381351580209
mean absolute error: 2.9212025316455694
Elastic Net Regressor Results
Mean Squared Error: 55.83116626511688
R-squared: 0.0018104521500861837
mean absolute error: 4.0478073837331445
Radius Neighbors Regressor Results
Mean Squared Error: 8.973691110784241e+34
R-squared: -1.48166511902087e+34
mean absolute error: 9729295397526136.0

Optimizing the models - First Approach¶

Perform Hyper-Parameter Tuning on the Models¶

Hyper-parameter tuning was performed on the following models, since they yielded the highest R-squared scores among the models tested:

  • Random Forest Regressor
  • Histogram-Based Gradient Boosting
  • K-Nearest Neighbors Regressor
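
The "Best Hyperparameters" lines in the iterations below, together with the `n_iter=20` warning in the KNN output, suggest a randomized search. A minimal sketch of that kind of search for the Random Forest case, assuming a small hypothetical grid and synthetic data (the notebook's actual grids are not shown):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption), single target for brevity.
rng = np.random.default_rng(0)
X = rng.integers(1, 175, size=(200, 3))
y = rng.poisson(lam=5.0, size=200)

# Hypothetical search space; the values are illustrative only.
param_distributions = {
    "n_estimators": [20, 40],
    "max_depth": [5, 10],
    "min_samples_split": [10, 15],
    "min_samples_leaf": [2, 4],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=1),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled parameter combinations
    cv=3,              # 3-fold cross-validation per combination
    scoring="r2",
    random_state=1,
)
search.fit(X, y)
print("Best Hyperparameters:", search.best_params_)
```

When the grid holds fewer combinations than `n_iter`, scikit-learn falls back to trying all of them and emits the warning seen in the KNN iterations.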

Iterations for Random Forest (RF) Regressor¶

Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.928958393105038
R-squared: 0.49663590673300156
mean absolute error: 2.6323390583657864
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 10, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.928958393105038
R-squared: 0.49663590673300156
mean absolute error: 2.6323390583657864
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.900895599495165
R-squared: 0.49718000005206253
mean absolute error: 2.6397685170607654
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 10, 'min_samples_leaf': 4, 'max_depth': 5}
Mean Squared Error: 37.38108382013141
R-squared: 0.24787742552140726
mean absolute error: 3.5582910596792368
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.900895599495165
R-squared: 0.49718000005206253
mean absolute error: 2.6397685170607654
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.900895599495165
R-squared: 0.49718000005206253
mean absolute error: 2.6397685170607654
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.900895599495165
R-squared: 0.49718000005206253
mean absolute error: 2.6397685170607654
Best Hyperparameters: {'n_estimators': 300, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 18.358640232112
R-squared: 0.49228192721420205
mean absolute error: 2.66464858496764
Best Hyperparameters: {'n_estimators': 250, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 18.348343979326263
R-squared: 0.49257004515939246
mean absolute error: 2.6625904431614718
Best Hyperparameters: {'n_estimators': 450, 'min_samples_split': 15, 'min_samples_leaf': 8, 'max_depth': 10}
Mean Squared Error: 18.777789215683814
R-squared: 0.485768406447904
mean absolute error: 2.6973421196246625
Best Hyperparameters: {'n_estimators': 400, 'min_samples_split': 55, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 23.92497566088106
R-squared: 0.4239082506998882
mean absolute error: 2.9652493531921866
Best Hyperparameters: {'n_estimators': 100, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Mean Squared Error: 17.900895599495165
R-squared: 0.49718000005206253
mean absolute error: 2.6397685170607654

Iterations for Histogram-Based Gradient Boosting (HBGB)¶
Best Hyperparameters: {'estimator__max_depth': 7, 'estimator__learning_rate': 0.01, 'estimator__l2_regularization': 0.2}
Mean Squared Error: 32.18282924168508
R-squared: 0.3320401942652549
mean absolute error: 3.2649741520630413
Best Hyperparameters: {'estimator__min_samples_leaf': 40, 'estimator__max_depth': None, 'estimator__learning_rate': 0.01, 'estimator__l2_regularization': 0.0}
Mean Squared Error: 34.768348922898916
R-squared: 0.28056428760433166
mean absolute error: 3.322235227775795
Best Hyperparameters: {'estimator__min_samples_leaf': 10, 'estimator__max_depth': None, 'estimator__learning_rate': 0.01, 'estimator__l2_regularization': 0.1}
Mean Squared Error: 27.707811341396553
R-squared: 0.39456052080811127
mean absolute error: 3.123993487397762

Iterations for K-Nearest Neighbors (KNN) Regressor¶
Best Hyperparameters: {'weights': 'distance', 'p': 1, 'n_neighbors': 9}
Mean Squared Error: 18.85456904070352
R-squared: 0.46409840345829956
mean absolute error: 2.643735813566883
/usr/local/lib/python3.10/dist-packages/sklearn/model_selection/_search.py:320: UserWarning: The total space of parameters 16 is smaller than n_iter=20. Running 16 iterations. For exhaustive searches, use GridSearchCV.
  warnings.warn(
Best Hyperparameters: {'weights': 'distance', 'p': 1, 'n_neighbors': 9}
Mean Squared Error: 18.85456904070352
R-squared: 0.46409840345829956
mean absolute error: 2.643735813566883
Best Hyperparameters: {'weights': 'distance', 'p': 1, 'n_neighbors': 11}
Mean Squared Error: 19.612298984089254
R-squared: 0.46018526487952816
mean absolute error: 2.677782519371975

Apply Boosting Algorithms¶
MultiOutputRegressor(estimator=AdaBoostRegressor(estimator=HistGradientBoostingRegressor(random_state=1),
                                                 random_state=1))
AdaBoostRegressor Mean Squared Error: 15.514865328785016
AdaBoostRegressor R-squared: 0.4603984456235
AdaBoostRegressor Mean Absolute Error: 2.5524177817460036
RegressorChain(base_estimator=RandomForestRegressor(max_depth=10,
                                                    min_samples_leaf=2,
                                                    min_samples_split=15,
                                                    random_state=1))
Regression Chain Model Mean Squared Error: 14.907590528079728
Regression Chain Model R-squared: 0.5211560076522394
Regression Chain Model Mean Absolute Error: 2.4530789200162197


Creating the Testing and Training Datasets - Second Approach¶

Testing Different Models - Second Approach¶

Predicting Total Counts¶

/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  return fit_method(estimator, *args, **kwargs)
RandomForestRegressor()
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
HistGradientBoostingRegressor()
Lasso()
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  return fit_method(estimator, *args, **kwargs)
ExtraTreesRegressor()
KNeighborsRegressor()
ElasticNet()
RadiusNeighborsRegressor()
/usr/local/lib/python3.10/dist-packages/numpy/core/numeric.py:407: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
Random Forest Results
Mean Squared Error: 47.507770147679324
R-squared: 0.7805812475805141
mean absolute error: 5.139229957805908
Histogram-Based Gradient Boosting Results
Mean Squared Error: 42.3805269624916
R-squared: 0.8042618644469353
mean absolute error: 5.0338833270972545
Lasso Results
Mean Squared Error: 215.0452817500199
R-squared: 0.006794735079959646
mean absolute error: 10.677136243599007
Extra-Trees Results
Mean Squared Error: 61.829009704641344
R-squared: 0.7144373619188399
mean absolute error: 5.8675316455696205
K-Nearest Neighbors Results
Mean Squared Error: 78.54392405063292
R-squared: 0.6372380818601189
mean absolute error: 6.74535864978903
Elastic Net Results
Mean Squared Error: 214.33255721979504
R-squared: 0.010086515072044833
mean absolute error: 10.465525613654595
Radius Neighbors Results
Mean Squared Error: 8.973691110784241e+34
R-squared: -4.144576986049706e+32
mean absolute error: 9729295397526140.0
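
The `DataConversionWarning` messages in this section indicate that the target was passed as a single-column 2D object rather than a 1D array. A minimal sketch of the difference, using a toy frame with made-up values (how the notebook actually builds its target is an assumption here):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy data for illustration only.
df = pd.DataFrame({
    "HOOD_158": [1, 1, 2, 2, 3, 3],
    "OCC_YEAR": [2021] * 6,
    "OCC_MONTH": [1, 2, 1, 2, 1, 2],
    "Total_Count": [62, 43, 30, 28, 15, 17],
})
X = df[["HOOD_158", "OCC_YEAR", "OCC_MONTH"]]

y_2d = df[["Total_Count"]]             # shape (6, 1): triggers the warning
y_1d = df["Total_Count"].to_numpy()    # shape (6,): what the estimator expects
y_flat = y_2d.to_numpy().ravel()       # equivalent fix suggested by the warning

model = RandomForestRegressor(n_estimators=10, random_state=1)
model.fit(X, y_1d)                     # fits without the DataConversionWarning
print(y_2d.shape, y_1d.shape)
```

The warning is harmless for the results, since scikit-learn flattens the column internally, but passing a 1D target keeps the cell output clean.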

Predicting Assault Counts¶

/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  return fit_method(estimator, *args, **kwargs)
RandomForestRegressor()
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
HistGradientBoostingRegressor()
Random Forests Results
Mean Squared Error: 18.884251160337556
R-squared: 0.7621670878841997
mean absolute error: 3.122795358649789
Histogram-Based Gradient Boosting Results
Mean Squared Error: 16.26862998275755
R-squared: 0.7951088654730386
mean absolute error: 3.003620298129256

Predicting Auto Theft Counts¶

/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  return fit_method(estimator, *args, **kwargs)
RandomForestRegressor()
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
HistGradientBoostingRegressor()
Random Forests Results
Mean Squared Error: 18.344488924050633
R-squared: 0.31502817625221924
mean absolute error: 2.793428270042194
Histogram-Based Gradient Boosting Results
Mean Squared Error: 12.029894217961466
R-squared: 0.5508112209565743
mean absolute error: 2.4878698854824157

Predicting Break and Enter Counts¶

/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  return fit_method(estimator, *args, **kwargs)
RandomForestRegressor()
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
HistGradientBoostingRegressor()
Random Forests Results
Mean Squared Error: 8.362676476793249
R-squared: 0.28879841796872596
mean absolute error: 2.052943037974684
Histogram-Based Gradient Boosting Results
Mean Squared Error: 7.185335567450399
R-squared: 0.38892506039455554
mean absolute error: 1.9032188779372148

Predicting Robbery Counts¶

/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  return fit_method(estimator, *args, **kwargs)
RandomForestRegressor()
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
HistGradientBoostingRegressor()
Random Forests Results
Mean Squared Error: 2.590130696202532
R-squared: 0.21079584323330303
mean absolute error: 1.1071835443037976
Histogram-Based Gradient Boosting Results
Mean Squared Error: 2.218185363946034
R-squared: 0.3241263414730897
mean absolute error: 0.9848720183410007

Predicting Theft Over Counts¶

/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  return fit_method(estimator, *args, **kwargs)
RandomForestRegressor()
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
HistGradientBoostingRegressor()
Random Forests Results
Mean Squared Error: 1.4372389240506327
R-squared: 0.21472270183775177
mean absolute error: 0.8258755274261604
Histogram-Based Gradient Boosting Results
Mean Squared Error: 1.139272032559403
R-squared: 0.3775255814261935
mean absolute error: 0.7612698567025898

Optimizing the models - Second Approach¶

Auto Theft Models¶

Apply Regression Chain Boosting Algorithm on RF Regressor¶

Regression Chain Random Forest Regressor (Auto Theft) Mean Squared Error: 18.3886164556962
Regression Chain Random Forest Regressor (Auto Theft) R-squared: 0.31338048162557164
Regression Chain Random Forest Regressor (Auto Theft) Mean Absolute Error: 2.798417721518987

Perform Hyper-Parameter Tuning on RF Regression Chain¶

Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters: {'base_estimator__n_estimators': 50, 'base_estimator__min_samples_split': 10, 'base_estimator__min_samples_leaf': 1, 'base_estimator__max_depth': 5}
Regression Chain Random Forest Regressor (Auto Theft) Mean Squared Error: 15.601535081978382
Regression Chain Random Forest Regressor (Auto Theft) R-squared: 0.4174483692289197
Regression Chain Random Forest Regressor (Auto Theft) Mean Absolute Error: 2.924723449937228

Apply ADA Boosting Algorithm on HBGB¶

/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
AdaBoost Mean Squared Error: 18.659485596532548
AdaBoost R-squared: 0.30326639612753314
AdaBoost mean absolute error: 2.8809512187634048

Perform Hyper-Parameter Tuning on ADA Boosted HBGB Regressor¶

Attempts were made to perform hyper-parameter tuning on the ADA Boosted HBGB Regressor, but the runs took too long to complete and did not yield significant R-squared scores.

Total Count Models¶

Apply Regression Chain Boosting Algorithm on RF Regressor¶

Regression Chain Random Forest Regressor (Total Count) Mean Squared Error: 48.1800003164557
Regression Chain Random Forest Regressor (Total Count) R-squared: 0.7774764943051415
Regression Chain Random Forest Regressor (Total Count) Mean Absolute Error: 5.157500000000001

Perform Hyper-Parameter Tuning on RF Regression Chain¶
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters: {'base_estimator__n_estimators': 100, 'base_estimator__min_samples_split': 10, 'base_estimator__min_samples_leaf': 1, 'base_estimator__max_depth': 10}
Regression Chain Random Forest Regressor (Total Count) Mean Squared Error: 66.41763382339101
Regression Chain Random Forest Regressor (Total Count) R-squared: 0.6932444038758028
Regression Chain Random Forest Regressor (Total Count) Mean Absolute Error: 6.172144837821778
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Parameters: {'base_estimator__n_estimators': 100, 'base_estimator__min_samples_split': 10, 'base_estimator__min_samples_leaf': 1, 'base_estimator__max_depth': 10}
Regression Chain Random Forest Regressor (Total Count) Mean Squared Error: 68.88399843268999
Regression Chain Random Forest Regressor (Total Count) R-squared: 0.6818532852461192
Regression Chain Random Forest Regressor (Total Count) Mean Absolute Error: 6.253101443805737
Fitting 5 folds for each of 81 candidates, totalling 405 fits
Best Parameters: {'base_estimator__n_estimators': 150, 'base_estimator__min_samples_split': 5, 'base_estimator__min_samples_leaf': 1, 'base_estimator__max_depth': 10}
Regression Chain Random Forest Regressor (Total Count) Mean Squared Error: 71.02147143653406
Regression Chain Random Forest Regressor (Total Count) R-squared: 0.6719811809908387
Regression Chain Random Forest Regressor (Total Count) Mean Absolute Error: 6.345448931345114
Fitting 5 folds for each of 81 candidates, totalling 405 fits
Best Parameters: {'base_estimator__n_estimators': 100, 'base_estimator__min_samples_split': 2, 'base_estimator__min_samples_leaf': 1, 'base_estimator__max_depth': 10}
Regression Chain Random Forest Regressor (Total Count) Mean Squared Error: 69.62343491331661
Regression Chain Random Forest Regressor (Total Count) R-squared: 0.6784381337968257
Regression Chain Random Forest Regressor (Total Count) Mean Absolute Error: 6.2914368593362715

Apply ADA Boosting Algorithm on HBGB¶

/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
AdaBoost Mean Squared Error: 45.44923474580936
AdaBoost R-squared: 0.7900887716820576
AdaBoost mean absolute error: 5.161318291385233

Perform Hyper-Parameter Tuning on ADA Boosted HBGB Regressor¶

Attempts were made to perform hyper-parameter tuning on the ADA Boosted HBGB Regressor, but the runs took too long to complete and did not yield significant R-squared scores.

Perform Hyper-Parameter Tuning on RF Regressor¶

Fitting 5 folds for each of 20 candidates, totalling 100 fits
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  return fit_method(estimator, *args, **kwargs)
Best Parameters: {'n_estimators': 150, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_depth': 10}
Random Forest Regressor (Total Count) Mean Squared Error: 71.17906332274143
Random Forest Regressor (Total Count) R-squared: 0.6712533292108969
Random Forest Regressor (Total Count) Mean Absolute Error: 6.347966311436887
Fitting 5 folds for each of 20 candidates, totalling 100 fits
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  return fit_method(estimator, *args, **kwargs)
Best Parameters: {'n_estimators': 50, 'min_samples_leaf': 2, 'max_depth': 10}
Random Forest Regressor (Total Count) Mean Squared Error: 68.1934671493194
Random Forest Regressor (Total Count) R-squared: 0.6850425638048225
Random Forest Regressor (Total Count) Mean Absolute Error: 6.233294034033055
Fitting 5 folds for each of 20 candidates, totalling 100 fits
/usr/local/lib/python3.10/dist-packages/sklearn/base.py:1473: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  return fit_method(estimator, *args, **kwargs)
Best Parameters: {'n_estimators': 50, 'min_samples_split': 15, 'min_samples_leaf': 2, 'max_depth': 10}
Random Forest Regressor (Total Count) Mean Squared Error: 70.81104307153397
Random Forest Regressor (Total Count) R-squared: 0.6729530626257474
Random Forest Regressor (Total Count) Mean Absolute Error: 6.383808010790825

Perform Hyper-Parameter Tuning on HBGB Regressor¶

Fitting 5 folds for each of 20 candidates, totalling 100 fits
/usr/local/lib/python3.10/dist-packages/sklearn/utils/validation.py:1339: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
Best Parameters: {'min_samples_leaf': 10, 'max_iter': 300, 'max_depth': 10, 'learning_rate': 0.01, 'l2_regularization': 0.0}
HistGradientBoostingRegressor (Total Count) Mean Squared Error: 68.39689337906064
HistGradientBoostingRegressor (Total Count) R-squared: 0.68410302213826
HistGradientBoostingRegressor (Total Count) Mean Absolute Error: 6.424647018374788

Create a Voting Ensemble Learning Model with the default RF and HBGB Models¶

Voting Regressor with fitted RF and HBGB

/usr/local/lib/python3.10/dist-packages/sklearn/ensemble/_voting.py:694: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
Voting Regressor (Total Count) Mean Squared Error: 41.42631600357465
Voting Regressor (Total Count) R-squared: 0.8086689704319083
Voting Regressor (Total Count) Mean Absolute Error: 4.853770088554295

Voting Regressor with unfitted RF and HBGB

/usr/local/lib/python3.10/dist-packages/sklearn/ensemble/_voting.py:694: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
Voting Regressor (Total Count) Mean Squared Error: 41.182782954814
Voting Regressor (Total Count) R-squared: 0.8097937489168987
Voting Regressor (Total Count) Mean Absolute Error: 4.8440160089608995

Final Model¶

Hyper-Parameter Tuned Voting Regressor with unfitted RF and HBGB Regressors

Fitting 5 folds for each of 3 candidates, totalling 15 fits
/usr/local/lib/python3.10/dist-packages/sklearn/ensemble/_voting.py:694: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
Best Parameters for Voting Regressor: {'n_jobs': -1, 'weights': [1, 2]}
Voting Regressor (Total Count) Mean Squared Error: 40.66584193188937
Voting Regressor (Total Count) R-squared: 0.8121812858181674
Voting Regressor (Total Count) Mean Absolute Error: 4.862563006179832

      NEIGHBOURHOOD_158                HOOD_158
0     West Queen West (162)                 162
1     Morningside Heights (144)             144
2     Moss Park (73)                         73
3     Fort York-Liberty Village (163)       163
4     Eglinton East (138)                   138
...   ...                                   ...
1091  Broadview North (57)                   57
1337  Guildwood (140)                       140
1369  Lambton Baby Point (114)              114
1412  Bayview Woods-Steeles (49)             49
1778  Woodbine-Lumsden (60)                  60

159 rows × 2 columns

      HOOD_158  OCC_YEAR  OCC_MONTH  Assault  Auto Theft  Break and Enter  Robbery  Theft Over  Total_Count  NEIGHBOURHOOD_158
0            1      2021          1       18          35                7        1           3           62  West Humber-Clairville (1)
1            1      2021          2       17          17                5        1           3           43  West Humber-Clairville (1)
2            1      2021          3       15          20                8        6           6           54  West Humber-Clairville (1)
3            1      2021          4       11          31                4        2           4           52  West Humber-Clairville (1)
4            1      2021          5       18          26                9        5           4           62  West Humber-Clairville (1)
...        ...       ...        ...      ...         ...              ...      ...         ...          ...  ...
6630       174      2024          2        9           0                5        1           1           15  South Eglinton-Davisville (174)
6631       174      2024          3        6           1                0        0           0            7  South Eglinton-Davisville (174)
6632       174      2024          4       12           2                2        0           1           17  South Eglinton-Davisville (174)
6633       174      2024          5        8           1                2        0           1           12  South Eglinton-Davisville (174)
6634       174      2024          6        6           4                1        1           1           13  South Eglinton-Davisville (174)

6635 rows × 10 columns

      NEIGHBOURHOOD_158                HOOD_158  OCC_YEAR  OCC_MONTH  Assault  Auto Theft  Break and Enter  Robbery  Theft Over  Total_Count
0     West Humber-Clairville (1)              1      2021          1       18          35                7        1           3           62
1     West Humber-Clairville (1)              1      2021          2       17          17                5        1           3           43
2     West Humber-Clairville (1)              1      2021          3       15          20                8        6           6           54
3     West Humber-Clairville (1)              1      2021          4       11          31                4        2           4           52
4     West Humber-Clairville (1)              1      2021          5       18          26                9        5           4           62
...   ...                                   ...       ...        ...      ...         ...              ...      ...         ...          ...
6630  South Eglinton-Davisville (174)       174      2024          2        9           0                5        1           1           15
6631  South Eglinton-Davisville (174)       174      2024          3        6           1                0        0           0            7
6632  South Eglinton-Davisville (174)       174      2024          4       12           2                2        0           1           17
6633  South Eglinton-Davisville (174)       174      2024          5        8           1                2        0           1           12
6634  South Eglinton-Davisville (174)       174      2024          6        6           4                1        1           1           13

6635 rows × 10 columns
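
The three tables above show a neighbourhood lookup, the counts with the name attached, and the same frame with the name moved to the front. A minimal sketch of one way to get there with pandas, using a few rows copied from the tables; the variable names `counts`, `names`, `merged`, and `ordered` are hypothetical.

```python
import pandas as pd

# A few rows taken from the tables above, with most columns omitted.
counts = pd.DataFrame({
    "HOOD_158": [1, 1, 174],
    "OCC_YEAR": [2021, 2021, 2024],
    "OCC_MONTH": [1, 2, 6],
    "Total_Count": [62, 43, 13],
})
names = pd.DataFrame({
    "NEIGHBOURHOOD_158": [
        "West Humber-Clairville (1)",
        "South Eglinton-Davisville (174)",
    ],
    "HOOD_158": [1, 174],
})

# Attach the neighbourhood name to every monthly row via the shared id.
merged = counts.merge(names, on="HOOD_158", how="left")

# Move the name column to the front, keeping the other columns in order.
other_columns = [c for c in merged.columns if c != "NEIGHBOURHOOD_158"]
ordered = merged[["NEIGHBOURHOOD_158"] + other_columns]
print(ordered)
```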


Results¶

Visualizations based on current data¶

Total count of Criminal Incidents for past three years¶

Crime Statistics for Previous Years with a Month Breakdown¶

Crime Statistics for 2021¶

     NEIGHBOURHOOD_158                      OCC_YEAR  Total_Count
139  West Humber-Clairville (1)                 2021          849
93   Moss Park (73)                             2021          778
36   Downtown Yonge East (168)                  2021          774
156  York University Heights (27)               2021          562
125  St Lawrence-East Bayfront-The Islands      2021          527

     NEIGHBOURHOOD_158         OCC_YEAR  Total_Count
78   Lambton Baby Point (114)      2021           42
56   Guildwood (140)               2021           48
150  Woodbine-Lumsden (60)         2021           51
113  Princess-Rosethorn (10)       2021           54
88   Markland Wood (12)            2021           58
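
The per-year tables list the neighbourhoods with the highest and lowest yearly totals. A minimal sketch of how such rankings could be derived from monthly rows, using toy neighbourhood labels and made-up counts (the notebook's own aggregation code is not shown, so this is an assumption):

```python
import pandas as pd

# Toy monthly data for illustration only; labels and counts are made up.
monthly = pd.DataFrame({
    "NEIGHBOURHOOD_158": ["A (1)", "A (1)", "B (2)", "B (2)", "C (3)", "C (3)"],
    "OCC_YEAR": [2021] * 6,
    "Total_Count": [70, 60, 4, 5, 10, 12],
})

# Sum the monthly counts into one row per neighbourhood and year.
yearly = monthly.groupby(
    ["NEIGHBOURHOOD_158", "OCC_YEAR"], as_index=False
)["Total_Count"].sum()

year_2021 = yearly[yearly["OCC_YEAR"] == 2021]
top = year_2021.nlargest(2, "Total_Count")      # highest totals first
bottom = year_2021.nsmallest(2, "Total_Count")  # lowest totals first
print(top)
print(bottom)
```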

Crime Statistics for 2022¶

No description has been provided for this image
NEIGHBOURHOOD_158 OCC_YEAR Total_Count
139 West Humber-Clairville (1) 2022 1146
93 Moss Park (73) 2022 702
156 York University Heights (27) 2022 694
36 Downtown Yonge East (168) 2022 690
152 Yonge-Bay Corridor (170) 2022 604
No description has been provided for this image

back to the top

NEIGHBOURHOOD_158 OCC_YEAR Total_Count
56 Guildwood (140) 2022 53
9 Bayview Woods-Steeles (49) 2022 59
150 Woodbine-Lumsden (60) 2022 67
4 Avondale (153) 2022 70
64 Humber Heights-Westmount (8) 2022 73
No description has been provided for this image

back to the top

Crime Statistics for 2023¶

No description has been provided for this image
NEIGHBOURHOOD_158 OCC_YEAR Total_Count
139 West Humber-Clairville (1) 2023 1371
156 York University Heights (27) 2023 847
36 Downtown Yonge East (168) 2023 790
93 Moss Park (73) 2023 770
152 Yonge-Bay Corridor (170) 2023 717
No description has been provided for this image

back to the top

NEIGHBOURHOOD_158 OCC_YEAR Total_Count
150 Woodbine-Lumsden (60) 2023 58
78 Lambton Baby Point (114) 2023 68
56 Guildwood (140) 2023 83
64 Humber Heights-Westmount (8) 2023 83
88 Markland Wood (12) 2023 96
No description has been provided for this image

Comparison of Actual Data and the Predicted Data of 2024¶

back to the top

<ipython-input-163-525f6f4de7e7>:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  predicted_crime_2024.loc[: , 'Predicted_Total_Count'] = y_voting_TC_pred
NEIGHBOURHOOD_158 HOOD_158 OCC_YEAR OCC_MONTH Total_Count Predicted_Total_Count
36 West Humber-Clairville (1) 1 2024 1 110 113.0
37 West Humber-Clairville (1) 1 2024 2 101 108.0
38 West Humber-Clairville (1) 1 2024 3 79 110.0
39 West Humber-Clairville (1) 1 2024 4 93 110.0
40 West Humber-Clairville (1) 1 2024 5 104 110.0
... ... ... ... ... ... ...
6630 South Eglinton-Davisville (174) 174 2024 2 15 11.0
6631 South Eglinton-Davisville (174) 174 2024 3 7 14.0
6632 South Eglinton-Davisville (174) 174 2024 4 17 13.0
6633 South Eglinton-Davisville (174) 174 2024 5 12 13.0
6634 South Eglinton-Davisville (174) 174 2024 6 13 12.0

948 rows × 6 columns
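The `SettingWithCopyWarning` in the output above typically appears when a column is assigned on a DataFrame that was itself produced by filtering another frame. The cell that built `predicted_crime_2024` is not shown, so the following is only a minimal sketch under the assumption that it came from a boolean filter on the full crime frame; the frame name `crime` and the three sample rows (taken from the tables in this notebook) are illustrative. Taking an explicit `.copy()` before adding the column usually avoids the warning.

```python
import pandas as pd

# Illustrative stand-in for the full frame (assumption: the real frame has more columns).
crime = pd.DataFrame({
    "OCC_YEAR": [2021, 2024, 2024],
    "OCC_MONTH": [1, 1, 2],
    "Total_Count": [62, 110, 101],
})

# An explicit copy makes the filtered frame independent of `crime`,
# so adding a column no longer targets a possible view of the original.
predicted_crime_2024 = crime[crime["OCC_YEAR"] == 2024].copy()

# Stand-in for y_voting_TC_pred (the first two predictions shown in the table above).
predicted_crime_2024["Predicted_Total_Count"] = [113.0, 108.0]

print(predicted_crime_2024)
```

With the copy in place, the original frame is left untouched and only the 2024 subset gains the prediction column.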

Overview: This chart compares the actual and predicted crime counts for each of the first six months of 2024.

   OCC_MONTH  Total_Count  Predicted_Total_Count
0          1         3518                 3189.0
1          2         3185                 3008.0
2          3         3207                 3340.0
3          4         3201                 3406.0
4          5         3471                 3732.0
5          6         3101                 3676.0
No description has been provided for this image
  • Key Observations:
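    • Monthly predictions are within about 10% of the actual counts for January through May. The model underpredicts January and February (e.g., 3,189 predicted vs. 3,518 actual in January) and overpredicts March through June, with the largest gap in June (3,676 predicted vs. 3,101 actual, about 18.5% too high).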
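
To put numbers on how closely the predictions track the actual counts, the monthly table above can be turned into signed and percentage errors. This is a small sketch using only the six aggregated rows shown; the variable names and the choice of MAE and MAPE as summary metrics are assumptions made here for illustration, not necessarily the metrics used elsewhere in the notebook.

```python
# Monthly totals copied from the comparison table above.
months = [1, 2, 3, 4, 5, 6]
actual = [3518, 3185, 3207, 3201, 3471, 3101]
predicted = [3189, 3008, 3340, 3406, 3732, 3676]

# Signed error: positive means the model overpredicted.
errors = [p - a for a, p in zip(actual, predicted)]
pct_errors = [100 * e / a for e, a in zip(errors, actual)]

# Summary metrics over the six months.
mae = sum(abs(e) for e in errors) / len(errors)
mape = sum(abs(p) for p in pct_errors) / len(pct_errors)

for month, err, pct in zip(months, errors, pct_errors):
    print(f"month {month}: error {err:+d} ({pct:+.1f}%)")
print(f"MAE = {mae:.1f} incidents, MAPE = {mape:.2f}%")
```

The signed errors make the direction of the miss visible, which a plain accuracy score would hide.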

Visualizations based on the Predictions¶

back to the top

array([1, 2, 3, 4, 5, 6])
NEIGHBOURHOOD_158 HOOD_158 OCC_YEAR OCC_MONTH
0 West Humber-Clairville (1) 1 2024 7
1 West Humber-Clairville (1) 1 2024 8
2 West Humber-Clairville (1) 1 2024 9
3 West Humber-Clairville (1) 1 2024 10
4 West Humber-Clairville (1) 1 2024 11
... ... ... ... ...
6630 South Eglinton-Davisville (174) 174 2027 8
6631 South Eglinton-Davisville (174) 174 2027 9
6632 South Eglinton-Davisville (174) 174 2027 10
6633 South Eglinton-Davisville (174) 174 2027 11
6634 South Eglinton-Davisville (174) 174 2027 12

6635 rows × 4 columns

back to the top

NEIGHBOURHOOD_158 HOOD_158 OCC_YEAR OCC_MONTH Total_Counts
0 West Humber-Clairville (1) 1 2024 7 111
1 West Humber-Clairville (1) 1 2024 8 111
2 West Humber-Clairville (1) 1 2024 9 108
3 West Humber-Clairville (1) 1 2024 10 110
4 West Humber-Clairville (1) 1 2024 11 107
... ... ... ... ... ...
6630 South Eglinton-Davisville (174) 174 2027 8 12
6631 South Eglinton-Davisville (174) 174 2027 9 13
6632 South Eglinton-Davisville (174) 174 2027 10 14
6633 South Eglinton-Davisville (174) 174 2027 11 14
6634 South Eglinton-Davisville (174) 174 2027 12 15

6635 rows × 5 columns

Anticipated Crime Statistics for next six months of 2024¶

Overview: This graph depicts the predicted crime trend for the last half of 2024 (July to December).

back to the top

No description has been provided for this image

Key Observations:

  • Predicted crime activity rises from July to August, drops in September, recovers by November, and falls again in December.
  • These fluctuations may reflect factors such as seasonal changes, public events, or social patterns; the model itself does not identify a cause.

Anticipated Total count of Crime Activities for upcoming three years¶

OCC_YEAR Total_Counts
0 2025 42175
1 2026 42175
2 2027 42167

Overview: This chart shows the predicted total crime counts for 2025, 2026, and 2027.

back to the top

No description has been provided for this image

Key Observations:

  • The predicted annual total is essentially flat: 42,175 incidents in both 2025 and 2026, and 42,167 in 2027, a difference of only 8 incidents.
No description has been provided for this image

Anticipated Crime Statistics for Upcoming Years with a Month Breakdown¶

back to the top

OCC_YEAR OCC_MONTH Total_Counts
0 2025 1 3189
1 2025 2 3008
2 2025 3 3340
3 2025 4 3406
4 2025 5 3732
5 2025 6 3676
6 2025 7 3659
7 2025 8 3711
8 2025 9 3575
9 2025 10 3632
10 2025 11 3685
11 2025 12 3562
12 2026 1 3189
13 2026 2 3008
14 2026 3 3340
15 2026 4 3406
16 2026 5 3732
17 2026 6 3676
18 2026 7 3659
19 2026 8 3711
20 2026 9 3575
21 2026 10 3632
22 2026 11 3685
23 2026 12 3562
24 2027 1 3189
25 2027 2 3008
26 2027 3 3332
27 2027 4 3406
28 2027 5 3732
29 2027 6 3676
30 2027 7 3659
31 2027 8 3711
32 2027 9 3575
33 2027 10 3632
34 2027 11 3685
35 2027 12 3562

Overview: This chart focuses on the predicted crime counts for each month of 2025-2027.

back to the top

No description has been provided for this image

Key Observations:

  • As in the 2024 predictions, counts fluctuate across months, peaking in May (3,732) and August (3,711) and bottoming out in February (3,008).
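
The monthly table above can also be cross-checked against the yearly totals reported earlier. The sketch below sums the monthly predictions per year and lists the months in which the three years differ; the dictionary layout and variable names are illustrative choices, and the numbers are copied from the table.

```python
# Predicted monthly totals (January to December) copied from the table above.
monthly = {
    2025: [3189, 3008, 3340, 3406, 3732, 3676, 3659, 3711, 3575, 3632, 3685, 3562],
    2026: [3189, 3008, 3340, 3406, 3732, 3676, 3659, 3711, 3575, 3632, 3685, 3562],
    2027: [3189, 3008, 3332, 3406, 3732, 3676, 3659, 3711, 3575, 3632, 3685, 3562],
}

# Yearly totals should reproduce the OCC_YEAR / Total_Counts table.
yearly_totals = {year: sum(counts) for year, counts in monthly.items()}

# Months (1-12) where at least one year has a different prediction.
differing_months = [
    month for month in range(1, 13)
    if len({counts[month - 1] for counts in monthly.values()}) > 1
]

print(yearly_totals)
print("Months that differ between years:", differing_months)
```

Only March differs between the years, which accounts for the 8-incident gap between the 2027 total and the other two.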

Overview: This heat map visualizes the number of crime incidents by month and year, predicting data for 2025 through 2027.

  • The color gradient ranges from light green, indicating fewer incidents, to dark blue, representing higher crime counts.

back to the top

No description has been provided for this image

Key Observations:

  • The monthly table puts the highest predicted counts in May (3,732) and August (3,711), so the darkest cells should fall in those months in every year; counts stay elevated from May through November.
  • Counts dip in December (3,562, down from 3,685 in November), and the lowest values occur in January and February, which suggests a seasonal pattern.
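
One way to check which months a heat map of these predictions should shade darkest is to rank the predicted monthly counts directly. The sketch below does this for 2025 using the values from the monthly table; the variable names are illustrative, and the ranking is a stand-in for reading colors off the plot rather than the notebook's own plotting code.

```python
# Predicted 2025 counts by month, copied from the monthly breakdown table.
predicted_2025 = {
    1: 3189, 2: 3008, 3: 3340, 4: 3406, 5: 3732, 6: 3676,
    7: 3659, 8: 3711, 9: 3575, 10: 3632, 11: 3685, 12: 3562,
}

# Months ordered from highest to lowest predicted count.
ranked = sorted(predicted_2025, key=predicted_2025.get, reverse=True)
highest_three = ranked[:3]
lowest_three = ranked[-3:]

# Change from November to December (negative means a drop).
december_change = predicted_2025[12] - predicted_2025[11]

print("Highest months:", highest_three)
print("Lowest months:", lowest_three)
print("November to December change:", december_change)
```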

Anticipated Crime Statistics for 2025¶

back to the top

No description has been provided for this image
NEIGHBOURHOOD_158 OCC_YEAR Total_Counts
139 West Humber-Clairville (1) 2025 1322
156 York University Heights (27) 2025 789
36 Downtown Yonge East (168) 2025 760
93 Moss Park (73) 2025 747
152 Yonge-Bay Corridor (170) 2025 679
No description has been provided for this image

back to the top

NEIGHBOURHOOD_158 OCC_YEAR Total_Counts
150 Woodbine-Lumsden (60) 2025 93
78 Lambton Baby Point (114) 2025 95
64 Humber Heights-Westmount (8) 2025 113
107 Old East York (58) 2025 113
19 Broadview North (57) 2025 115
No description has been provided for this image

Anticipated Crime Statistics for 2026¶

back to the top

No description has been provided for this image
NEIGHBOURHOOD_158 OCC_YEAR Total_Counts
139 West Humber-Clairville (1) 2026 1322
156 York University Heights (27) 2026 789
36 Downtown Yonge East (168) 2026 760
93 Moss Park (73) 2026 747
152 Yonge-Bay Corridor (170) 2026 679
No description has been provided for this image

back to the top

NEIGHBOURHOOD_158 OCC_YEAR Total_Counts
150 Woodbine-Lumsden (60) 2026 93
78 Lambton Baby Point (114) 2026 95
64 Humber Heights-Westmount (8) 2026 113
107 Old East York (58) 2026 113
19 Broadview North (57) 2026 115
No description has been provided for this image

Anticipated Crime Statistics for 2027¶

back to the top

No description has been provided for this image
NEIGHBOURHOOD_158 OCC_YEAR Total_Counts
139 West Humber-Clairville (1) 2027 1322
156 York University Heights (27) 2027 789
36 Downtown Yonge East (168) 2027 760
93 Moss Park (73) 2027 747
152 Yonge-Bay Corridor (170) 2027 679
No description has been provided for this image

back to the top

NEIGHBOURHOOD_158 OCC_YEAR Total_Counts
150 Woodbine-Lumsden (60) 2027 85
78 Lambton Baby Point (114) 2027 95
64 Humber Heights-Westmount (8) 2027 113
107 Old East York (58) 2027 113
19 Broadview North (57) 2027 115
No description has been provided for this image

Summary and Conclusion¶

The charts for 2025 through 2027 look alike for the following reasons:

  • Seasonality in Crime: Predicted counts follow a seasonal pattern, rising in spring to a peak in May and staying elevated through summer and fall.
  • Prediction Model Consistency: The model is trained on historical data starting in 2021. Tree-based regressors such as RF and HBGB cannot extrapolate beyond the range of years they were trained on, which likely explains why each future year receives nearly identical predictions.
  • Data Patterns: The model reproduces recurring annual trends, with lower counts in the colder months (January and February) and peaks in spring and early summer.
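
The near-identical predictions for 2025 through 2027 are consistent with a general property of tree-based regressors: for an input beyond the training range, they return the value of the outermost leaf. The toy 1-D regression tree below is written from scratch to illustrate that property; it is not the notebook's model, the functions `fit_tree` and `predict` are hypothetical helpers, and the only data used are the three yearly totals for West Humber-Clairville shown earlier (849, 1146, 1371).

```python
from statistics import mean

def fit_tree(xs, ys, depth=2):
    """Fit a tiny 1-D regression tree by greedy squared-error splits."""
    # Stop when depth is exhausted or the node holds a single x value.
    if depth == 0 or len(set(xs)) < 2:
        return ("leaf", mean(ys))
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        # Sum of squared errors around each side's mean.
        sse = (sum((y - mean(left)) ** 2 for y in left)
               + sum((y - mean(right)) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold)
    threshold = best[1]
    left_pairs = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right_pairs = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return ("split", threshold,
            fit_tree(*zip(*left_pairs), depth - 1),
            fit_tree(*zip(*right_pairs), depth - 1))

def predict(tree, x):
    """Walk down the tree until a leaf is reached."""
    while tree[0] == "split":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]

# Yearly totals for West Humber-Clairville from the 2021-2023 tables.
years = [2021, 2022, 2023]
totals = [849, 1146, 1371]
tree = fit_tree(years, totals)

# Every year past 2023 lands in the same leaf, despite the upward trend.
future = {year: predict(tree, year) for year in (2025, 2026, 2027)}
print(future)
```

Even though the training totals rise each year, the toy tree returns the 2023 value for every later year, which mirrors the flat year-over-year forecasts seen above.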

In conclusion, our project provided valuable insights into Toronto’s crime trends over the past few years, highlighting seasonal patterns, neighbourhood-specific crime rates, and the challenges of predictive modelling with real-world data. We hope these insights can help us better understand crime patterns and improve safety in the city.

back to the top